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Given a data set (ti,yi), i = 1, ... ,n with the t, G [0, 1] non-parametric 
regression is concerned with the problem of specifying a suitable function 
f n : [0, 1] — * M such that the data can be reasonably approximated by the 
points (ti, f n (ti)), i = 1, . . . , n. If a data set exhibits large variations in local 
behaviour, for example large peaks as in spectroscopy data, then the method 
must be able to adapt to the local changes in smoothness. Whilst many meth- 
ods are able to accomplish this they are less successful at adapting derivatives. 
In this paper we show how the goal of local adaptivity of the function and its 
first and second derivatives can be attained in a simple manner using weighted 
smoothing splines. A residual based concept of approximation is used which 
forces local adaptivity of the regression function together with a global reg- 
ularization which makes the function as smooth as possible subject to the 
approximation constraints. 
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1 Introduction 



1.1 Smoothing and weighted smoothing splines 

In the one-dimensional case nonparametric regression is concerned with determining 
a function /„ : [0, 1] — ► R which adequately represents a data set y n = {(ti,y(ti)) : 
U G [0, 1], i = l,...,n}. The problem is to provide a function /„ which is an 
adequate representation of the data. One well established method for accomplishing 
this goal is that of smoothing splines defined as the solution of the problem 

n „1 

minS(g,X)--=y2(y(U)-g(U)) 2 + X g^(t) 2 dt (1) 



Ruppert et al 



where A is the smooth ing parameter (see lWahbal (1l990l ): lGreen and Silvermanl (119941 ); 



(120031 )). This approach has two weaknesses. The first is that there 
may not be any choice of A for which the resulting fit is satisfactory. This is partic- 
ularly the case if the data show large local variations such as in Figure [T] which are 
taken from thin film physics. They were kindly supplied by Prof. Dieter Mergel of 
the Department of Physics, University of Duisburg-Essen. X-rays are beamed onto 
a thin film and the data give the photon count of the diffracted rays as a function of 
the angle of diffraction. The sample size is n = 7001. The high peaks can only be ad- 
equately captured with a small value of A in (CQ). This has however the consequence 
that the function oscillates too rapidly between the peaks. The second problem is 
to give an automatic choice for A. Methods suggested include cross-validation, gen- 



eralized cr oss-validation, generalized maximum like 



likelihood (ICraven and Wahbal (119781 ): IWahbal (I1985T ): iRuppert et all (120031 )1 How 



ihood and restricted maximum 



ever it is clear that if there is no satisfactory value of A then no automatic choice 
will work. 
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Figure 1: Data from thin- film physics showing the photon count of X-rays as a function of the 
angle of diffraction measured in degrees. 
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In this paper we attain more flexibility by considering a vector A = (Ai, . . . , A n ) 
rather than a single value A and we replace the minimization problem (pQ) by 



min S(g, A) := V Uy{U) ~ 9(U)) 2 + t 9 {2 \t? dt. 
i=i Jo 



(2) 



Comparing this with (CD) we see that the smoothing parameter A has now been 
transferred from the penalty term to the observations them selves. The solution, 



Green and Silverman 



which we denote by /„(• : A), is a natural cubic spline (see 
( 119941 )) but the Aj now control the fit at the observation points (U, y(ti)) rather than 
the size of the penalty which is now fixed. In the case of the data displayed in Figure 
CD we would choose large values of Aj at the peaks causing them to be adequately 
approximated. At points away from the peaks we would choose the Aj to be small 
and thus ensure a smooth solution at these points. 

The method proposed here belongs to the category of s patially adaptive s plines . 



For other spa t ially 



Denison et al 



998) 



a daptive spline methods we refer to 



Ruppert and Carroll (2000), 



f 2001 ). IPittmanl (120021 ) . 

(hooq i. 



Wood et al. 



(12002 



Luo and Wahba ( 



Zhou and Shenl (200ll) . 



Mivata and Shenl (|2003 



1997) 



. 3iMatteo et al. 



2Q05| ) 



Pintore et al 



1.2 Contents 

In Section 2 we describe an approach to choosing a model in the context of non- 
parametric regression which is based on a universal, honest and non-asymptotic 
confidence region. Section 3 shows how the ideas of Section 2 can be adapted to 
give a simple method for choosing the weights of a weighted smoothing spline. Ex- 
amples and the results of a small simulation study are given in Section 4. Section 
5 gives two variations on this theme and Section 6 extends the method to image 
analysis. Finally in Section 7 we look at the asymptotics. 
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2 Choosing a model 



2.1 Nonparametric confidence regions 

A lot of work has been devoted to choosing a model from a sequence of models of 
increasing complexity. Choosing a value of A in ([T]) falls into this category as the 
smaller A the more complex the resulting smoothing spline. Methods developed to 
solve the problem include cross-validation, plug-in methods as well as AIC and BIC 
which are explicitly phrased in terms of bal ancing complexity and fid elity. We take a 
different app r oach h ere which is implicit in iDavies and Kovad (120041 ) and explicit in 



Davies et al. 



(l2008bl ). We define a universal, honest and non-asymptotic confidence 
region A n and given this region we choose non- decreasing X\, j = 1, 2, . . . to force 
f n (- : A J ) to eventually lie in A n . This gives a sequence of functions of increasing 
roughness (or complexity) and we choose the first one which lies in A n . The region 
A n is based on the residuals and requires a stochastic model. The one we use is 



Y(t) = f(t) + aZ(t), < t < 1 



(3) 



with Z(t) standard Gaussian white noise. Following IDavies et al.l (j2008bl ) A n is 
defined as follows. For any function g we consider normalized sums of residuals over 
intervals 



w( 



y n Ji9) = -4= ^2{y(ti) - g{U)) 

y\ 1 \ uei 



(4) 



where |/| denotes the number of points £j in the interval I. For data Y n = Y n (f) 
generated under the model (El) we define the confidence region for / by 



An = A n (Y n ,a,I n ,r n ) = [g : max \w(Y n ,I,g)\ < a^/^~^ogn} 
where I n is a family of intervals and r n = r n (a) is defined by 



P ( max 



i€l 



< A/r n logn 



(5) 



5 



P.L. Davies and M. Meise 



n 


100 250 500 1000 2500 5000 10000 


0.95 
0.99 


2.92 2.88 2.79 2.71 2.64 2.60 2.55 
3.60 3.41 3.33 3.17 3.03 3.00 2.92 



Table 1: The values of r„(a) for the dyadic scheme X n , a — 0.95 and 0.99 and n = 
100, 250, 500, 1000, 2500, 5000 and 10000. 



It follows that A n (Y n , a,X n , r n ) is a universal, honest and non-asymptotic confidence 
region for /, that is 

P(f e A n (Y n (f),a,I n ,T n )) = a for all / and n. 

The family of intervals X n can be taken to be the family of all intervals but this 
is computationally expensive. For all practical purposes it suffices to consider a 
subfamily of intervals as long as it is multiscale, that is, if it contains intervals of 
all sizes. The simplest such scheme, and the one we shall use, corresponds closely 
to that defined by the Haar wavelet. If n = 2 m the family X n consists of all one- 
point intervals [tj, £i] . . . , [t n , t n ], all two point intervals [ti, t 2 ), [£3, £4], . . . , [t n -i> tn}, 
all four-point intervals [£1,^4], [£5,£s], ■ ■ ■ , [^-3,^] and so forth. If n is not a power 
of 2 we simply include the last interval whatever its form. In the remainder of the 
paper we use this dyadic scheme. For any scheme X n and for given a the values of 
r n (a) as defined by (jSJ) can be obtained by simulations. Table Q] gives the values 
of T n (ce) for the dyadic scheme just described, a = 0.95 and 0.99 and for various 
sample sizes n. The results are based on 10000 simulations. 

It follows from a result of iDumbgen and Spokoinyl (120011 ) and the very precise 



result of 



Kabluchkol (120071 ) that if X n contains all one-point intervals then 

lim r n (a) = 2 



for all a. In particular this holds for the dyadic multiscale family X n we consider. 
The resulting curves are not sensitive to the value of r n (a) and so for simplicity in 
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the remainder of the paper we simply put r n (a) = 3. This is consistent with the 
values of Table [TJ 

An Associate Editor asked to what extent the results depend on the chosen 
scheme X n and the value of r n . This can be analysed as follows. Suppose the data 
are generated by a function / and consider a function /„ which differs from / by 
5 n on an interval /, that is f n (t) — fit) > 5 n , t G I. This will be detected by the 
procedure if f n ^ A n . If X n is the family of all intervals then f n ^ A n follows from 



7f=f X^fo) - ^ W^logn. 



From this we deduce that the deviation will be detected with probability at least 
a -0.01 if 

S n > o (V r n log n + 2.3263) fy/\T\. (6) 



If we use the dyadic scheme X' n it is no longer guaranteed that I G T' n . However there 
exists an interval I' C I in X' n with |/'| > |/|/2. The same argument gives 



S n > ( rv / 2(v /r n 1 °g n + 2 - 3263 )/ 



(7) 



Denser schemes X „(k) paramet e rized 



Davies et al. 



)y a parameter k, 1 < k < 2, with \X n {K) 



(j2008bl ): the dyadic scheme corresponds to the case 



0(n) are given in 
k = 2. If we use X n (n) then we can replace j7j) by 

<5 n > av / ^(v /r n( K ) logn + 2.3263)/v^T- 

As r n (/c) < r n this can be made arbitrarily close to the case of all intervals (EJ). 
The dyadic sche me is the coarsest we u se, but it is nevertheless efficacious as shown 

(j2008al ). The analysis we have done is for a worst- 



by the results of 



Davies et al. 



case situation, the actual performance may be better. As an example we take 
a = 0.95, cr = 1 and n = 1000. It follows from Table [T] that the value of r' n in (J7|) is 
2.71. Simulations show that the corresponding value of r n in ([6]) is 2.91. If |/| = 24 
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Figure 2: The upper panel shows standard white noise Z(t). The lower panel shows 
f{t) + Z(t) with f n (t) = 0.7 for 0.5 < t < 0.524 and zero otherwise. 

then we have 5 n > 1.39 for ([6]) and 5 n > 1.92 for (JTj). The upper panel of Figure [2] 
shows standard white noise / = 0: the lower panel shows f n (t) + Z(t) for the same 
noise with f(t) = 5 for 0.5 < t < 0.524, zero otherwise and 5 = 0.7. The signal in the 
lower panel is difficult to detect by eye: the signal-to-noise ratio is 0.11. However it 
is detected using the dyadic scheme with r n = 3. If we put r n = 2.71 then a signal 
with 5 = 0.63 is detected. If we use all intervals then with 7~ n — 3 cL SI gnal with 
5 = 0.67 is detected, for r n = 2.91 a signal with 5 = 0.64 is detected. The differences 
are not large. 

So far we have assumed that a is known which is not the case. We use the 



default value (jDavies and Kovad (120011 )) 



1.4826 



MED{| 3 /(t i )-3/(t i -i)|,i = 2,...,n-l}. 



For data generated under the model we have 



Y(ti) - y(ti_i) = z{u) - z(u-i) + f{u) - f{u-i 
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If Z is a N(0, 1) random variable it may be checked that the median of \Z — c\ 
strictly exceeds that of \Z\ for any c ^ 0. From this it follows that a n is always 
biased upwards under the model. Consequently A n (Y n ,a n ,X n ,r n ) is no longer a 



univer sal, exact, non- asymptotic confidence region but it is a universal, honest (]Li 



(119891 ) ) . non-asymptotic confidence region 

P(f G A n {Y n , a n ,l n , r n )) > a for all / and n. 
Given the confidence region A n (y n , a n ,X n , r n ) and the measure of roughness 

R{g) = f g^{tfdt 



o 



the natural approach would be to solve 

minimize R(g) subject to g <E A n (y n ,a n ,2 n ,T n ). (9) 

As A n is defined by a set of linear inequalities involving the values of g at the points 
ti the problem is one of quadratic programming. If we take the dyadic scheme for 
I n then A n is defined by about An linear inequalities. For small data sets with 
n < 1000 which exhibit little local variability it is possible to solve this directly 
but the approach fails for data sets such as those of Figure [T] with n = 7001. 
The quadratic programming problem involves 7001 parameters and the number of 
linear constraints is about 28000. Furthermore the fact that the squared second 
derivative varies by several orders of magnitude over the interval causes excessive 
numerical instability. In contrast the problem can be solved for in a fast and 
stable manner even for values of Aj which differ by orders of magnitude. In the 
next section we describe an automatic procedure for doing this which attempts to 
emulate the solution of @. 

he idea of the confidence region as defined above is implicit in 



Davies and Kovac 



(120011 ) . A similar idea was used by Diimbgen and Spokoiny (2001) for testing 
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for monotonicity and convexity of nonparametric functions. Universal, exact, non- 
asymptotic confidence regions based on the signs of the residuals sjgrujj ( t^ — g (tj)) 



rather than the residuals themselve s are to be found implicitly in 



explicitly in 



Dumbgenl 12003 



Daviesl (119951 ) and 



20071 ) and lDiimbgen and Johns! (120041 ) . These require 



only that under the model the errors are independently distributed with median 
zero. As a consequence they do not require an auxiliary estimate of scale such as 



3 Choosing the weights 
3.1 The procedure 

The procedure we use is based on the following heuristic. If ||A|| is small then the 
solution / n (- : A) of (j2j) will be essentially the least squares line through the data. 
If on the other hand all the components \ of A are very large then /„(• : A) will 
almost interpolate that data and will lie in A n as all residuals will be close to zero. 
The idea is then to start with very small \ and then to increase them gradually 
until f n (- : A) lies in A n and then stop. More formally we start with the least 
squares regression line and check whether this lies in A n . If so we stop and accept 
the solution. Otherwise put A 1 = (Ai, . . . , Ai) where Ai is chosen to be so small that 
the solution of ([2]) with A = A 1 differs from the least squares lines by some small 
prescribed quantity. At the ith stage we have the solution f n (- : X 1 ) based on the 
weights A*. We check if the solution lies in A n and if so we stop. If not we determine 
those intervals Jj G I n for which 

w(.V n , h fn(- ■ A 1 )) > a n ^T n \ogn. (10) 

For all points tj in any such interval we increase the corresponding A*- by a factor of 
q, that is A* +1 = q\y Our default value for q is 2. The remaining A* are not altered. 
This gives us a new A* +1 and we repeat the procedure. 
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As defined the procedure is difficult to analyse, especially as the effect is a 
finite sample one: it will gradually disappear for a fixed function / as the sample 
size n tends to infinity. The problem can be circumnavigated to a certain extent as 
follows. We consider a second procedure but this time with the components A* of A* 
all equal, A*- = A\ j — 1, . . . , n. If the solution does not lie in A n then all components 
are increased by a factor of q and not just those whose tj values lie in intervals Ii for 
which (II 01) holds. For this form of A = (A, . . . , A) it can be shown that R(f n (- '■ A)) 
depends monotonically on A which makes it amenable to mathematical analysis. 
If we now perform both procedures and then choose at the end the smoothest of 
the two solutions we have a procedure which can be analysed. We have not yet 
encountered a data set where the result of the second procedure with equal weights 
was chosen. We point out that solving ([2]) for this form of A is equivalent to solving 
(PQ) but with A -1 in place of A. The second procedure therefore does the following. It 
considers the one-dimensional family of solutions of (CQ) and chooses the smoothest 
such function which lies in A n . This is an alternative to choosing the smoothing 
parameter by cross-validation or likelihood methods. 

3.2 An illustration 

We apply the procedure to the thin film data of Figure [U The value of o n of (jHJ) is 
8.3868. With n = 7001 and r n (a) = 3 we have 

(7 n Vr n (a) logn = 8.3868^/3 - log 7001 = 43.22 

The upper panel of Figure [3] shows the resulting curve: the lower panel shows the 
associated values of the Aj on a logarithmic scale. It is noticeable that the values of 
the Aj are large in the neighbourhoods of the large peaks and small outside of these. 
The manner in which the curve alters in the course of the iterations is shown on a 
larger scale in Figure HI The rows show the results after 1, 15 and 25 iterations and 
the final result after 33 iterations for the first 1000 observations. In each case the 
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left panel shows the curve and the right panel the weights A, on a logarithmic scale. 
Initially the weights are constant with a value of 2.9 • 10~ 8 . After 15 iterations they 
are still constant but now with the common value 2.5 • 10~ 4 . After 25 iterations the 
smallest weights are 7.4 • 10~ 3 and the largest is 0.18. The smallest weights for the 
final curve are still 7.4 ■ 10~ 3 but the largest weight is now 20. The values of the Aj 
differ by a factor of 30000. The final row shows the advantage of the local weights 
Aj. Where the data can be fitted with a smooth curve the A; are small and the fit is 
smooth. Where there is a pronounced peak the values of the Aj are large and this 
forces the solution of §T§ to adjust to the peak. 



4 Examples and simulations 



4-1 The thin film data 



The estimators we consider a re the weighted smoothing spline (wss), the spatially 



adaptive spline method due to 



Ruppert and Carroll! (120001 ) and the standard smooth- 



ing spline (smspt) with smoothing parameter chosen by cross validation. The Ruppert- 



Carro ll method uses so called 'penalized splines' which are the p-splines of 



(119961 ) (see also 



O' Sullivan! ( 1986 



Eilers and Marx 



19881 )). In contrast to smoothing splines they use 



a spatially weighted penalty term with the weights being determined by generalized 
cross-validation. The method is not fully au tomatic and requires the sp ecification 



of the maximum number of knots. Based on iRuppert and Carroll! (l2000l ) the num- 
bers we choose are 40, 80, 160 and 320: we denote the corresponding estimators by 
pspl40, pspWO, pspll60 and pspl320. Figure E] shows the results for the complete data 
set. It is seen that the peaks are satisfactorily captured only by the wss, pspl320 
and smspl reconstructions. Figure [6] shows the results for the first 1000 observations 
only for these three methods. Only the wss succeeds in capturing the peaks and 
giving a smooth reconstruction between the peaks. 
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20 30 40 50 60 70 80 




20 30 40 50 60 70 80 



Figure 3: The upper panel shows the results of applying the weighted smoothing spline procedure 
to the thin film data. The lower panel shows the values of the Aj on a logarithmic scale. 



13 



P.L. Davies and M. Meise 




16 18 20 22 24 16 18 20 22 24 



Figure 4: The rows show the results after 1, 15, 25 and 33 iterations of the weighted smoothing 
splines procedure to the first 1000 data points of the thin film data. The left panel 
shows the curve and the right panel the weights Ai on a logarithmic scale. 
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50 60 



30 40 50 




50 60 70 






50 60 70 



Figure 5: The top row shows from left to right the wss and psplJ^O reconstructions, the centre row 
the results for the pspl80 and pspll60 reconstructions and the bottom row the results 
for the pspl320 and the smspl reconstructions. 



15 



P.L. Davies and M. Meise 




16 



Approximating Data with weighted smoothing Splines 



4-2 Some simulation results 



We giv e the results of a small simulation study using the functions of lRuppert and Carroll 



(boom 



f< \ f< ^ ATI T • ( Ml + ^ )/b) 

f(x) = f(x;j) = y/x(l - x) sin I ——^—^ 



with j = 6 and the bumps data of iDonoho and Johnstone! (119951 ). We consider 



signal to noise ratios of 3 and 7. The tables gives the median (MRISE) of the root 
integrated square error 

1 \ 1/2 



RISE(/,/ n )= (7 (f(t)-f n (t)) 2 dt\ 



for the fit itself and the first and second derivatives MRISE(/W, fn^), i = 1,2. The 
results are based on 1000 simulations. 

We expect locally adaptive methods to perform better when the signal exhibits 
large changes in local variability and the signal to noise ratio is large. This is borne 
out by the results. The local variability of the Ruppert-Carroll is not large and there 
is not much to choose between the four methods wss, pspl80,pspll60 and smspl both 
in the low and high signal to noise scenarios. However the RMISE often disguises 
clear differences in the behaviour of the estimators. Figure [7] shows a typical result 
for the high signal to noise regime for the Ruppert-Carroll-function and a sample 
size n = 1600. 

The local variability of the bumps data is much more pronounced and the wss 
estimator outperforms the other estimators in all cases. 
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a = 0.288/7 ^ 0.0411 




Fit 

400 800 1600 3200 


First derivative 
400 800 1600 3200 


Second derivative 
400 800 1600 3200 


wss 

pspl40 

pspl80 

pspll60 

smspl 


0.030 0.021 0.016 0.012 
0.062 0.058 0.057 0.056 
0.027 0.021 0.017 0.015 
0.022 0.016 0.012 0.009 
0.024 0.016 0.012 0.009 


0.267 0.161 0.096 0.057 
0.559 0.388 0.274 0.192 
0.339 0.218 0.145 0.101 
0.244 0.148 0.090 0.054 
0.294 0.139 0.080 0.047 


3.98 1.88 0.85 0.37 
6.48 3.33 1.69 0.85 
5.26 2.69 1.32 0.67 
3.87 2.12 1.14 0.56 
4.77 1.77 0.79 0.37 


a = 0.288/3 « 0.096 




Fit 

400 800 1600 3200 


First derivative 
400 800 1600 3200 


Second derivative 
400 800 1600 3200 


wss 

pspl40 

pspl80 

pspll60 

smspl 


5.60 2.71 1.22 0.54 
6.49 3.34 1.69 0.85 
5.59 2.79 1.35 0.68 
5.19 2.59 1.25 0.62 
5.48 2.43 1.11 0.52 


0.417 0.244 0.150 0.093 
0.567 0.392 0.275 0.193 
0.414 0.256 0.159 0.106 
0.387 0.242 0.142 0.083 
0.401 0.232 0.136 0.080 


5.60 2.71 1.22 0.54 
6.49 3.34 1.69 0.85 
5.59 2.79 1.35 0.68 
5.19 2.59 1.25 0.62 
5.48 2.43 1.11 0.52 



Table 2: Values of the MRISE based on 1000 simulations for the Ruppert and Carroll 
function with j — 6. 
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a = 2.2/3 « 0.733 




Fit 

400 800 1600 3200 


First derivative 
400 800 1600 3200 


Second derivative 
400 800 1600 3200 


wss 

pspl 40 
pspl 80 
pspl 160 
smspl 


0.80 0.69 0.56 0.43 
1.55 1.54 1.52 1.52 
1.18 1.15 1.14 1.14 
0.84 0.81 0.79 0.78 
1.14 0.91 0.84 0.65 


18.9 18.4 14.7 10.6 
31.1 27.6 21.9 16.5 
29.1 26.4 21.2 16.0 
24.1 23.9 19.8 15.1 
28.8 24.9 20.1 14.4 


637 788 761 642 
900 959 860 693 

889 957 860 693 
811 942 857 692 

890 951 858 690 


a = 2.2/7^ 0.314 




Fit 

400 800 1600 3200 


First derivative 
400 800 1600 3200 


Second derivative 
400 800 1600 3200 


wss 

pspl 40 
pspl 80 
pspl 160 
smspl 


0.44 0.35 0.25 0.18 
1.54 1.53 1.52 1.52 
1.14 1.13 1.13 1.13 
0.74 0.76 0.76 0.77 
1.10 0.86 0.81 0.62 


11.9 11.2 9.3 7.2 
31.1 27.6 21.9 16.5 
29.0 26.3 21.2 16.0 
23.4 23.7 19.7 15.1 
28.7 24.8 20.1 14.3 


417 522 570 539 
900 959 860 693 
889 957 860 692 
801 941 857 692 
889 950 858 690 



Table 3: Values of the MRISE b a sed o n 1000 simulations for the bumps function of 



Donoho and Johnstone! (119951 ) 
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5 Heteroscedasticity and robustness 
5.1 Nonparametric scale approximations 

The ideas developed in the previous section can also be used to obtain nonparametric 
approximations to heteroscedastic noise. The model we use is 

Y(t) = a(t)Z(t), 0<t<l, (11) 

Z(t) Gaussian white noise. Given data (tj, Y(ti)), i — 1, . . . , n, we define a confidence 
region as follows. We define for a function s : [0, 1] — > (0, oo) and an interval 
/ C [0, 1] 

v(Y n ,i,s) = Y,y^) 2 /s(t i f 

and then set 

C n (Y n ,l n ,a n ) = {s:qn((l-a n )/2,\I\)<v(Y n ,I,s) 

<qu((l + 0/2,|/|), lel n } 

where qu(7, k) denotes the 7-quantile of the chi-squared distribution with k degrees 
of freedom. The rationale is clear. Under the model (fTTj) the v(Y n ,I,a) has the 
chi-squared distribution with \I\ degrees of freedom. By an appropriate choice of a n , 
which may be determined by simulations, C n (Y n ,T n ,a n ) is an a-confidence region 
for o: 

P{a E C n (Y n ,X n ,a n )) = a 

so that the confidence region is uniform, exact and non-asymptotic. Furthermore in 
this particular model there are no "nuisance" parameters corresponding to the a of 
model (jHJ). The default value of 7„ we use is 

7 n = 1 — exp(— 1.5 log(n)) = 1 — tiT 1 ' 5 
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which roughly corresponds to the default choice of r n = 3 in the definition of A n . 
As before the second step is to regularize in C n (Y n ,T n , a n ). One possibility which 
is useful for quantifying the changes in volatility of financial data, the volatility of 
the volatility, is to take s t o be pi e cewise constant and to minimize the number of 
intervals of constancy (see iDaviesI (120061 )). In the present context however we are 
looking for a smooth approximation and we take recourse to weighted smoothing 
splines. We take s = s n to be the solution of 



mm 



n „i 

Y^HM -s n {u)) 2 + \ 

i=l J o 



] (t) 2 dt. 



where again the local weights are data dependent and are chosen so that the solution 
s n lies in C n (Y n ,T n , a n ). The procedure we use is similar to that described in Section 
13.11 but with some modifications. On intervals / where the inequality 



qu(( 



1 - 0/2, |/|) < v(y n , /, s n ) < qu((l + a n )/2, 



(12) 



is not satisfied we increase the weights by a factor of q but we do this firstly for 
single observations, that is intervals of length one. When ( fl2|) is satisfied for all 
such intervals we consider intervals of length two. When again all the inequalities 
are satisfied we move on to the next long er intervals until finally a ll inequalities are 
satisfied. A similar procedure was used in lDavies and Kovad (120041 ) in the context of 
approximating spectral densities. Figure M shows the result of the procedure applied 
to data generated according to the model 



Y(t) = sin(47rt) 2 Z(t). 
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Figure 8: Top panel: heteroscedastic noise. Bottom panel: the scale function and its reconstruc- 
tion using weighted smoothing spline.. 
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5.2 Robust smoothing 

A complete robustification of the procedure described in Section 13.11 would entail 
replacing (j2J) by, for example, 

n „1 

m.mS{g,\):=S^\ i \y{t i )-g{t i )\+ / g {2) (t) 2 dt, 
i=i Jo 

and the definition of approximation (J4]) by 

w(y n ,I,g) = -j== ^2sgn(r(y n ,ti,g)) 
V\ I \ uel 

to give rise to the confidence region 



V n (y n ,l n ) = {g : max w(y n ,I,g) < ^2 logn } 



lex, 



sec 



Diimbgen and Kovad (120051 )). A much simpler but reasonably effective method 
is the following. The noise level a n is quantified by (jHj). A running median with a 
window width of say five observations is applied to the data 

m 5 (ti) := MED(y(ti- 2 ),y(ti-i),y(ti),y(ti+i),y(t i+ 2)) 

and any data point y(ti) for which 

\y(ti) - m 5 (ti)\ > 3.5a n , 



is replaced by m 5 (tj) (see lHampell (119851 )). The weighted splines procedure is now 



applied to the cleaned data set. The procedure will work well as long as no group 
of five successive observations contains more than two outliers. Figure [9] shows 
the result of applying this robustified procedure to a sine curve contaminated with 
Cauchy noise. 
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;ure 9: The robustificd weighted spline procedure applied to a sine curve contaminated with 
Cauchy noise 
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6 Image analysis and weighted thin plate smoothing splines 
6.1 Weighted thin plate smoothing splines 

We consider data y n = {(tj, : i = 1, . . . , n 2 } with the tj of the form 

ti = (ji/n, ki/n), ji, ki = 0, . . . ,n - 1. 
Corresponding to ([2]) we consider minimizing 

n 2 

5(/,A):=^A(t J )(y(t t )-/(t J )) 2 + ^(/) 
i=l 

with 

It can be shown tha t the solution is a natural thin plate spline. We refer to 



Green and Silverman! (119941 ) . 



6.2 Approximation in two dimensions 

For a given function g : [0, l] 2 — > IR and a family Q n of subsets G of [0, l] 2 we define 
w{y n ,G,g) = — = ^2(y n (ti) - g(ti)). 



For data generated by the model 

y(t) = /(t) + <7Z(t), te[o,if 

this leads to the confidence region 
H*(Y n ,g n ,r) 



{g: max \w(Y n ,G,g)\ < a n a/2t log(n) }. 
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The additional factor 2 is due to the fact that we now have n 2 observations. The 
noise level a n is defined by 

1.48, 



-MED({|y(^±i 



2 

fit ki + 1 . ,ji k 



" * n ' n 



J + 2/1-, —J I : i = 1, • • • , n 2 

n n n n 



The quality of the results depends on the choice of Q n . If Q n contains too few sets 
then the concept of approximation is too crude. Consequently we require a fine 



to be efficient 



Friedrich et al. 



y calculated. 



(120071) • The 



division of [0, l] 2 but one which allows the w(y n ,G, g] 
Work in this direction has been done and we refer to 
family Q n we use is the set of all squares. 

6.3 An example 

As a simple example we consider the function F : R 2 — >■ R 

F(x,y) = 10exp(-:r 2 - 2y 2 ) 

on a 50 x 50 grid on [—7, 4] 2 with added normal noise, E{ ~ N(0, 1). Figure fTUl shows 
the function F and its contaminated version together with the thin plate splines 
reconstruction using generalized cross-validation and the weighted smoothing spline 
method. The main drawback of weighted thin plate splines is the numerical difficulty 
of calculating them for larger grids. 

7 Asymptotics 

7.1 Weighted smoothing splines 

Weighted smoothing splines may be seen as a heuristic method for solving 

min R(g) s. t. g G A(Y n , a n , X n , r n ). (13) 
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Figure 10: Top row: original function (left) and the noisy data (right). Bottom row: thin plate 
approximation using GCV (left) and the automatically weighted version (right). 



28 



Approximating Data with weighted smoothing Splines 



The resulting function f n is denned by an algorithm and in the absence of a proof 
that it yields at least an approximate solution, that is 

f fi 2 \t) 2 dt < K min I g (2) (tfdt 

JO g£A{Y n ,(T„,I n ,T n ) JO 

for some constant K > 0, we can either establish a rate of convergence on this as- 
sumption or we can try and analyse the algorithm. In the first case we are lead to 
a rate of convergence in the supremum norm of order (log(n)/n) 3//8 . Analysing the 
algorithm as it stands would essentially involve proving that it solves the minimiza- 
tion problem at least approximately. We therefore analyse a modified version of the 
procedure. We assume that the design points are of the form ti = i/n and that the 
data are generated as in ([31) with 



/ (i) (t) _ / (D (0) = / f W( u ) dUi (14) 



o 



f {2 \u) 2 du < oo. (15) 







For a given function g we denote the vector of values of g at the design points by 
g n . We consider firstly the case of a global A and denote the solution of ([2]) with 
Ai = . . . = A n = A by / n (A). It can be shown that f n (X) is a solution of 

n 

min S x (g n ) := ^ \(Y(ti) - g n (t t )) 2 + g t n n n g n (16) 
i=i 

where fl n is an n x n-non-negative definite matrix with normalized eigenvectors e ni 
and corresponding eigenvalues j n {, 1 < i < n with 7 n i = 7„2 = 0. The remaining 
eigenvalues satisfy the inequalities 

i A i 4 
ci— < 7™ < c 2 — , 3 < i < n (17) 
n n 



with the constants C\ and c 2 being independent of n (see lUtrerad (119831 )). For an 



interval J C {1, . . . ,n} we denote by 9i the vector whose elements 9i are 1/\J\J\ 
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for i e J and otherwise. We see that \\8j\\ = 1 and for the solution f n (X) of ffTHT) 
the w(Y n , I, /„(A)) of (JU) are given by 

w(y n , i, f n (\)) = ej (I) i\Y n - / n (A)), / e x n 

where J (I) is the interval of {1, . . . ,n} which gives the indices % with e /. We 
have 

Theorem 7.1 

(oj f n (\)tt n f n (\) is an increasing function of X. 

(b) E (j^(A)fi n / n (A)) < cn 1 / 4 A 5 / 4 /or some constant c. 



(c) There exists a constant A > such that for all A > A/logn and for all X n 
with \X n \ < qn for some fixed q and for all r > 2 we have 



lim P(maxHy n ,;,/ n (A))| < a^/rlogn ) = 1. 
Proof, (a) In the following I n denotes the identity matrix. The solution / n (A) of 



ffTol) is given by 

~f n (X) = X(XI n + Q n )- 1 Y n 
and on writing Y n = J2i=i Vni e ni we obtain 



a ( A + ^) 2 

from which the claim follows on noting that 7„j > for % > 3. 
(b) We have 
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and hence 

E (/i(A)njF B (A)) 
= x 2 f t n (\i n + n n )- 1 n n (\i n + n n )- 1 f n 

+a 2 E (A 2 Z'„(A/ n + O n )- 1 fi n (AI n + On) -1 ^) . 
Arguing as above we obtain 

A 2 /^(A/ n + fi n )- 1 fi n (A/ n + fi n ) _1 /n 



2 7m 



n 

< ^nCIni = fn^fn 

3 

and 

e(a 2 z'„(a/„, + n n )- 1 n n (AJ n + n n )~X) 

n 

■> 2 \ ^ 7m 

On splitting the last sum into two parts, from i — 3 to i — n 1 / 4 A 1 / 4 and from 
to i = n and on using (|T7|) we see that 

E(A 2 Z'„(A/„, + Q n ) _1 ^n(A/ n + tt n )- l Z n ) < cn^X 5 ' 4 

for some constant c. 
(c) We have 

Y n ~ /„(A) = (AJ n + fi,,)- 1 ^^. 
and on writing V n — f n + Z n we obtain 

- / n (A) = h n + 5 n 

with 

— (A/n + 1 ^nf n , 

S n — c(A/ n + ri n ) 1 vt n z n . 
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On writing f n = J^" a ni e ni we obtain 



n 

Ini 
(A + 7m) 



and hence 



im 2 = £ 



3 



n 2 
2 7ni 



,, (A + 7ni) 



"'/\ - \2 

2 - n 



A 4- m (1 + 7^/A) 2 " A 4^ m7m ' 



3 v ~ ' '"" ' v 3 

As fn^nfu = ^3 a nilm we see that at least asymptotically 



\\h II 2 <-f t Q f 

1 1 "-tx 1 1 — ^ J n^ L nJ n - 

We turn to 5 n . We write Z n = J2i ^ni e m where, because of the transformation is 
orthonormal, the Z* { are i.i.d. standard Gaussian random variables. It follows 

n 

and on writing Qj = Y^i Qni^-ni we obtain 

n / \ 2 

mow) = (yt-) <- 2 - 

3 V i Ini / 

The claim of the theorem follows from the usual upper bound for the tail of a Gaus- 
sian distribution. □ 



We consider the following modified procedure. We consider the solutions / n (A) 
of ffTBl) and determine the smallest value of A for which / n (A) G A(Y n ,l n , r n ). It 
follows from (c) of Theorem 17.11 this smallest value is asymptotically with arbitrarily 
large probability smaller A/ logn. If we denote this solution by / n (A*) then (a) and 
(b) of Theorem 17.11 imply 

lim lim F(~f n (\* n ynJ n (\* n ) < cn^ihgn)- 5 ^) = 1. (18) 
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for some c > 0. Let f n {\) be the solution obtained from the weighted smoothing 
spline procedures as described in Section |3~T1 respectively. If 

~f n (\ynj n (\) < ~f n (\* n ynj n (\* n ) 

then we accept f n (X) and otherwise we accept / n (A*) and denote the solution by 
/*. We have 

Theorem 7.2 /// satisfies ( fl^p and [T5\) and if S n is such that 

lim 5 n n 5/w {logn)- 9/w = oo 

then 

sup |/:(t)-/(t)| = P ((logn) 7 / 32 ^ 11 / 32 ). 

8 n <t<l—5 n 



Proof. For a function g satisfying the conditions (THl) and (1151) we have 

^(t + «)-^(t) = I* g {1 \t + u)du = sgU(t)+ f\gV{t + u)-gV{t))du 



_ ...^.'(fU / ( / „i2) 

and hence 



/ (j (? (2) (t + w)^ ) du 



As 



| + s) - g(t) - sgW (t) | < j* QT |^( 2 ) {t + «) | dv ) du. 



\g^(t + v)\dv) = ( [\v <u}\g {2 \t + v)\dv 



/ \J0 



< [\v<u} 2 dv [ g {2 \t) 2 dv = u I g {2 \t) 2 dv 
Jo Jo Jo 

for u with t + u < 1 by Cauchy-Schwarz we obtain 

\g{t + 8)-g(t)-8g<»{t)\ < f u 1 ' 2 ( t g^ 2 \t) 2 dv) ' du 



\J0 



^ 3/2 ( / 9 {2) (t) 2 dv 
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On combining this with the corresponding inequality for \g(t — s) — g(t) + sg^'(t)\ 
we conclude 



\g(t + s)+ g(t -s)- 2g(t) \ < ^ 2 ( j' g^ {uf du) ^ 



(19) 



At this point to simplify the proof we assume that I n is the family of all intervals 
of the form [tj, tj]. The only effect of taking X n to be the dyadic set of intervals is 
that the constants in the estimates below are somewhat larger. Consider now point 
tj = j/n and the interval Ij t k = [tj-k, tj+k\- As /* lies in X n we have 

k / / . .s 

J + 2 

-k 



n 



-f 



j + i 



n 



< a n y/r n log 



n. 



(20) 



We intend to use (fl9l) with g = g n = f* — f ', t = j /n and s = i/n. Firstly we note 
that for this g it follows from ffl5l) and (fl~8j) that 

qW(t) 2 dv = P (n^Oogn)- 5 / 4 ) . 

From this and (TTHD we deduce 



n 



n 



./ 



j + i 



n 



f 



J ~ * 



n 



+ R n 



with 



3/2 



R n = (^J Op(nV>gnr 5 / 8 ). 
On using this in (1201) we obtain after a short calculation 

^* (i) - (^) | ^ (!) 3/2 °- (« i/s co g «)- 6/s ) + 



a, 



r„logn 



2k 



As / is continuous it is easy to prove that lim n ^ 00 o n = a and as, as already noted, 



lim n ^ 00 r n = 2 we deduce 



/:(-)-/(- 



<0, 



^y /2 n i/8 (logn) ~5/8 + 



logn 
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The result follows on choosing k = n 11//16 (logn) 



9/16 



□ 



We note that for the solution /„ of ([TBI we have 

f fi 2) (tfdt< f f 2 \tfdt. 
Jo Jo 

This means that we can replace the term Op (n 1/ ' 8 (logn) _5/ ' 8 ) above by Op(l). The 
same argument now leads to the rate of convergence (log n/n) 3 / 8 mentioned above. 



7.2 Weighted thin plate smoothing splines 

The method of prove can be extended to obtain an analogous result for weighted thin 
plate smoothing splines. As the calculations are somewhat longer we only indicate 
how to do this. The estimates ( fTTI) are replaced by 



i 2 i 2 

C\— < Jni < C2 — , 

n n 



3 < i < n 



with the constants C\ and Ci being independent of n (see lUtreraj (119881 )). From this 
the same method of proof used for Theorem 17. II leads to a corresponding result. The 
family 1 n is taken to be the family of squares and now a two-dimensional version of 
the argument leading to Theorem 17.21 gives the result. 
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